The Decoupled-Style Prefetch Architecture
Authors
Abstract
Decoupled processing seeks to dynamically schedule memory accesses in order to tolerate memory latency by prefetching operands. Since decoupled processors cannot speculatively issue memory operations, control flow operations can significantly impact their ability to prefetch data. The prefetching architecture proposed here seeks to leverage the dynamic scheduling benefits of decoupled processing while allowing memory operations to be speculatively invoked. The prefetching mechanism is evaluated using the SPEC95 suite of benchmarks, and significant reductions in cache miss rate are achieved, resulting in speed-ups of over 40% of peak for most of the inputs.
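As a rough illustration of the idea of an access stream running ahead of the execute stream, the following minimal sketch compares the demand miss rate with and without lookahead prefetching. It is not the architecture evaluated in the paper: the cache model (fully associative, LRU), the line size, the function names, and the assumption that a prefetch completes instantly are all hypothetical simplifications, and speculation across control flow is not modeled.

```python
from collections import OrderedDict

LINE_SIZE = 32  # bytes per cache line (assumed value, not from the paper)


class LRUCache:
    """Toy fully associative LRU cache holding `capacity` lines."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()

    def touch(self, addr):
        """Access `addr`; return True on a hit, False on a miss (line is then filled)."""
        line = addr // LINE_SIZE
        hit = line in self.lines
        if hit:
            self.lines.move_to_end(line)  # mark as most recently used
        else:
            self.lines[line] = True
            if len(self.lines) > self.capacity:
                self.lines.popitem(last=False)  # evict least recently used
        return hit


def demand_miss_rate(addresses, capacity=64, lookahead=0):
    """Miss rate seen by the execute stream.

    With lookahead > 0, a hypothetical access stream runs `lookahead`
    references ahead and prefetches each operand (idealized: zero latency).
    """
    if not addresses:
        return 0.0
    cache = LRUCache(capacity)
    misses = 0
    for i, addr in enumerate(addresses):
        j = i + lookahead
        if lookahead > 0 and j < len(addresses):
            cache.touch(addresses[j])  # prefetch issued by the access stream
        if not cache.touch(addr):      # demand access by the execute stream
            misses += 1
    return misses / len(addresses)


# Streaming access pattern: one new cache line per reference.
stream = [i * LINE_SIZE for i in range(1000)]
baseline = demand_miss_rate(stream)
prefetched = demand_miss_rate(stream, lookahead=8)
```

In this toy model the only demand misses left with lookahead are the first few references, issued before the access stream has had a chance to run ahead.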
Similar resources
Improving the Performance of Loop-Based Programs Using a Prefetch Processor
We present an architecture called the CAPP (Computing And Prefetching Processor). The CAPP provides high performance for loop-based scientific and signal processing programs by improving memory system performance through a decoupled prefetch processor. The prefetch processor improves performance by relieving the main processor of prefetching instruction overhead and allowing the prefetch d...
Simplifying Hardware for Out Of Order Execution using the Decoupling Paradigm
Future hardware and software technology will try to provide improved performance by extracting higher levels of parallelism. However, the cost of a main memory access, in terms of missed instruction issue slots, increases with faster processors and greater issue widths. For this reason, latency hiding technology remains one of the most important parts of high performance processor designs. In this ...
Improving the parallelism and concurrency in decoupled architectures
Concurrency between access and execution has been exploited by queues in many decoupled access-execute architectures, but data-dependent control dependencies often prohibit prefetching of data to queues. This paper investigates a technique to facilitate anticipatory loading to queues even in the presence of data-dependent control dependencies. The proposed method consists of fetching along one or ...
Decoupled Sampling for Real-Time Graphics Pipelines
We propose decoupled sampling, an approach that decouples shading from visibility sampling in order to enable motion blur and depth-of-field at reduced cost. More generally, it enables extensions of modern real-time graphics pipelines that provide controllable shading rates to trade off quality for performance. It can be thought of as a generalization of GPU-style multisample antialiasing (MSA...
A Decoupled Fetch-Execute Engine with Static Branch Prediction Support
We describe a method for supporting static branch prediction on a decoupled fetch-execute pipeline. Using instruction buffers to decouple instruction fetch from the execute pipeline is an effective way to minimize instruction cache penalties by allowing instruction fetch and stall miss handling to proceed independently of the execution pipeline. Dynamic branch prediction is typically used with su...